This repository has been archived by the owner on Oct 18, 2021. It is now read-only.

Signal #39

Closed
wants to merge 2 commits into from

@bartvm bartvm commented Feb 18, 2016

Continuation of #38, but requires mila-iqia/platoon#61. This makes sure that if you press CTRL-C, or if the cluster sends a SIGTERM signal, the workers finish their current batch and save their parameters before quitting.
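
The "finish the batch, then save" behaviour can be sketched with Python's standard `signal` module. This is only an illustration, not the code in this PR: `GracefulStop`, `train_until_stopped`, `train_step` and `save_params` are hypothetical names, and the Platoon side of the change is not shown.

```python
import signal


class GracefulStop:
    """Remembers that a stop signal arrived (hypothetical helper)."""

    def __init__(self):
        self.requested = False

    def install(self, signals=(signal.SIGINT, signal.SIGTERM)):
        # Must be called from the main thread, as required by signal.signal.
        for sig in signals:
            signal.signal(sig, self.handle)

    def handle(self, signum, frame):
        # Only set a flag, so the batch in progress is never interrupted.
        self.requested = True


def train_until_stopped(batches, train_step, save_params, stop):
    """Train batch by batch; after a stop request, save and return."""
    done = 0
    for batch in batches:
        train_step(batch)
        done += 1
        if stop.requested:
            break
    # Parameters are saved both on a normal finish and on a stop request.
    save_params()
    return done
```

A worker would create one `GracefulStop`, call `install()` once at start-up, and pass it to the loop. Because the handler only sets a flag, CTRL-C no longer raises `KeyboardInterrupt` in the middle of an update.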

For EASGD they generally set beta and use it to calculate alpha as a
function of the communication period and the number of workers. Because
the controller is agnostic to the number of workers connected, we must
pass it explicitly.
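
As a rough sketch of that calculation: the function below assumes the relation beta / tau = p * alpha (tau being the communication period and p the number of workers), which is an assumption made for illustration and not necessarily the formula used in this PR. The name `easgd_alpha` is hypothetical.

```python
def easgd_alpha(beta, comm_period, n_workers):
    """Derive the moving rate alpha from beta.

    Assumes beta / comm_period == n_workers * alpha; this relation is an
    assumption of this sketch, not taken from the PR's code.
    """
    if comm_period < 1 or n_workers < 1:
        raise ValueError("comm_period and n_workers must be at least 1")
    return beta / (comm_period * n_workers)
```

Since the controller cannot count its workers, `n_workers` has to arrive as an explicit argument, for example from the launch script.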

In order to plot the logs of multiple workers together we should log the
actual time (instead of the relative time).
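
A minimal sketch of why wall-clock timestamps help: entries from different workers can be merged on one axis without knowing when each worker started. The entry layout and the names `make_log_entry` and `merge_logs` are hypothetical, not the logging format used in this repository.

```python
import time


def make_log_entry(worker_id, cost, clock=time.time):
    """Record the absolute time (seconds since the epoch), not time since start."""
    return {"worker": worker_id, "time": clock(), "cost": cost}


def merge_logs(*logs):
    """Interleave the entries of several workers in chronological order."""
    merged = [entry for log in logs for entry in log]
    return sorted(merged, key=lambda entry: entry["time"])
```

With relative times, two workers started a minute apart would both report `time == 0` for their first batch and their curves would be misaligned.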

The sorting of batches seems to lead to some strange training artifacts;
sometimes one of the workers gets significantly longer sentences than
the other for a while, which makes it look like it's doing really badly.
Moreover, the sentences sampled are always of the same length if
sample_freq divides the batch buffer. To make sure that this doesn't
affect training too much I shuffled the batches. There are still some
artifacts (one worker gets shorter/longer sentences for a while), but it
reduces the noise a bit. In general it might be better to monitor
cost/target length instead of just the cost.
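
The two ideas in this paragraph can be sketched as follows, assuming a buffer of tokenized sentences that is sorted by length, cut into batches, and then shuffled at the batch level. `shuffled_length_batches` and `cost_per_target_token` are hypothetical names, and the actual data pipeline in this repository may differ.

```python
import random


def shuffled_length_batches(sentences, batch_size, rng=random):
    """Sort a buffer by length, cut it into batches, then shuffle the batches.

    Sentences within a batch keep similar lengths (little padding), but
    consecutive batches no longer grow steadily longer.
    """
    ordered = sorted(sentences, key=len)
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    rng.shuffle(batches)
    return batches


def cost_per_target_token(total_cost, target_lengths):
    """Normalize a batch's summed cost by its number of target tokens."""
    n_tokens = sum(target_lengths)
    if n_tokens == 0:
        raise ValueError("batch has no target tokens")
    return total_cost / n_tokens
```

Monitoring the normalized value makes a worker that happens to receive long sentences comparable to one that receives short ones.
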
This was referenced Feb 22, 2016
@bartvm bartvm closed this Feb 24, 2016